This document is a report on the pre-registered experiment https://osf.io/s6vam/. The experiment examined silent failures in a highly controlled setting. Each trial starts with automated steering. In some trials a bias is introduced that causes the vehicle to veer off the road either suddenly (within 1.5 s) or gradually (~ 4 s). Participants are required to keep within the road edges and intervene if they feel that it is necessary. They complete the steering task on two bend radii, sharp (40 m) and gradual (80 m), without distraction or with an easy or difficult distraction.
The bias is to yaw-rate, so introduces error by causing the vehicle to understeer or oversteer. Effectively, the manipulation biases the mapping between steering angle - which does not change - and yaw-rate, which does change. Therefore we will henceforth call this error Steering Angle Bias (SAB).
SAB is introduced in all trials, though in half the trials the SAB is not large enough to cause the vehicle to leave the road (there will still be some drift). It would be confusing to call the ‘stay on road’ trials ‘no failure’ trials (or a ‘None’ failure type). Instead, the levels of Failure Type will be Sudden (rapidly drifts off road, requiring intervention), Gradual (slowly drifts off road, requiring intervention), and Benign (may experience some drift but it does not require intervention).
The different Cognitive Load levels will be Hard (three targets), Easy (one target), or None (no heard letters).
Bend levels will be referred to by their radii, 40 m or 80 m.
This experiment stands as a lesson in the need to conduct a full technical pilot: running a participant through the entire experiment, then putting that data through the entire analysis workflow. Due to time pressures we often do some piloting to ostensibly check data saving, condition indexing etc., but we do not very often take the time to process the data through the analysis scripts. Here there were two errors in the data saving. First, the distraction performance files were overwritten when the driver was also steering (we still have the baseline, no steering, distraction performance files). Secondly, the “unique” filename for saving individual trials did not include the steering angle bias. In effect, trials that differed only in SAB overwrote each other, and only the last six (randomised) trials were saved for each radius, irrespective of SAB.
This means we only have 20-30 % of the expected data (Fig 1). However, the number of trials saved for each condition is random (for driving without distraction we have slightly more because there are two blocks), and 20-30 % is still a reasonable chunk of trials (> 40 in each condition). So at the very least the reduced data set will be useful for tweaking the design for an improved re-run, and for developing the modelling architecture.
Fig 1. Number of trials in each condition
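The filename problem described above can be illustrated with a small sketch. This is not the experiment's actual saving code: the function names, the naming scheme, and the SAB values below are all hypothetical placeholders. The idea is simply that every factor distinguishing a trial (including SAB) appears in the name, and that the full design is checked for collisions before any data are collected.

```python
from itertools import product


def trial_filename(participant, cogload, radius_m, sab, repetition):
    """Hypothetical naming scheme: every factor that distinguishes a trial,
    including the steering angle bias (SAB), is part of the filename."""
    # Note: SAB is formatted to two decimals, so values closer together
    # than 0.01 would still collide under this scheme.
    return f"P{participant:02d}_{cogload}_r{radius_m}_sab{sab:+.2f}_rep{repetition}.csv"


def find_collisions(names):
    """Return the set of filenames that occur more than once."""
    seen, clashes = set(), set()
    for name in names:
        if name in seen:
            clashes.add(name)
        seen.add(name)
    return clashes


# Illustrative design only; these SAB values are placeholders.
design = list(product(["None", "Easy", "Hard"], [40, 80],
                      [-0.5, -0.2, 0.2, 0.5], range(6)))
names = [trial_filename(1, cl, r, sab, rep) for cl, r, sab, rep in design]
collisions = find_collisions(names)
```

Running such a check as part of a technical pilot would flag overwritten trials before the main data collection.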
We only have the cognitive performance for baseline (without steering). Here we plot three indices of performance to assess whether our levels of cognitive difficulty are reflected in performance measures: Percentage Correct (PC, (True positives + True negatives) / total letters heard); Reaction Time (RT, reaction time for True positives); Proportional Absolute Count Error (average distance from the true count, expressed as a proportion of the total number of targets heard).
Fig 2. Cognitive Task Performance. A) Percentage Correct. B) Reaction Time. C-D) Absolute Count Error
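As a minimal sketch of the first two indices, assuming each heard letter is stored as a (letter, RT) pair with RT set to None when no response was made (this data layout and the function names are assumptions, not the format of the saved files):

```python
def percentage_correct(events, targets):
    """PC = (true positives + true negatives) / total letters heard, as a percentage.

    A response to a target is a true positive; no response to a
    distractor is a true negative.
    """
    correct = sum((letter in targets) == (rt is not None) for letter, rt in events)
    return 100.0 * correct / len(events)


def mean_hit_rt(events, targets):
    """Mean reaction time over true positives only (None if there are no hits)."""
    hits = [rt for letter, rt in events if letter in targets and rt is not None]
    return sum(hits) / len(hits) if hits else None


# Toy data: "A" is the only target (Easy condition).
events = [("A", 0.5), ("B", None), ("A", None), ("C", 0.75), ("A", 1.0)]
pc = percentage_correct(events, {"A"})
rt = mean_hit_rt(events, {"A"})
```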
Fig 2 shows that people respond differently to each Cognitive Load condition. Participants rarely make mistakes (i.e. responding to a distractor letter or not responding to a target letter) in the Easy Cognitive Load condition, but they make more mistakes in the Hard Cognitive Load condition (Fig 2A). Participants also react slower to heard target letters if they are listening for three targets rather than only one target (Fig 2B).
The interpretation of the count error (Fig 2C) is more nuanced. The total error (cumulative across all targets) is very similar across both conditions. How you then use this total count error to take into account that people are responding to more targets in Hard than in Easy is the debatable bit. For example, estimating five instead of six occurrences (for one target) has the same total error as estimating zero, two, and three for target counts of one, two, and three (for three targets). Yet one could argue that the two responses are qualitatively different, because in the second case the participant has entirely missed a target but got the other two spot on. To reflect this I’ve chosen to express this measure as a proportion of the total number of targets heard (Fig 2C). In any case, people are only marginally less accurate on this measure in the Hard cognitive load condition than in Easy. It seems that the more precise measures of RT and PC are better indicators of differences in performance.
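The worked example in the previous paragraph can be written out as follows. This is a sketch under one reading of the measure (summed absolute error divided by the total number of target occurrences); the exact normalisation used for Fig 2C may differ, and the function names are illustrative.

```python
def total_count_error(true_counts, reported_counts):
    """Summed absolute difference between reported and true counts."""
    return sum(abs(r - t) for t, r in zip(true_counts, reported_counts))


def proportional_count_error(true_counts, reported_counts):
    """Assumed normalisation: total error / total number of targets heard."""
    return total_count_error(true_counts, reported_counts) / sum(true_counts)


# Easy: one target heard six times, participant reports five.
easy_total = total_count_error([6], [5])
# Hard: three targets heard one, two and three times; reports zero, two, three.
hard_total = total_count_error([1, 2, 3], [0, 2, 3])

easy_prop = proportional_count_error([6], [5])
hard_prop = proportional_count_error([1, 2, 3], [0, 2, 3])
```

Both responses have a total error of one, as stated in the text; how the proportional version treats them depends on the normalisation chosen.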
Therefore, a quick eyeball of the baseline cognitive task performance suggests that people reacted slower and experienced greater difficulty in deciding the appropriate response when there were three targets as compared to when there was only one.
We will now see how this gets reflected in steering measures.
The pre-registration specifies two hypotheses relating to taking over control of the vehicle:
H1: Participants will react slower to silent failures as cognitive load increases (i.e. more time will elapse from the steering angle bias to the driver disengaging the automation).
H2: Participants will react with less aggression as cognitive load increases, bringing the vehicle back to the centre more slowly (i.e. they will be smoother and make fewer corrections).
These two hypotheses will be analysed in turn.
The pre-registration states that two measures will be calculated:
RT from SAB onset to disengagement of the system via a button press (can be negative if they take over before the SAB is introduced). I will refer to this measure as RTtakeover.
RT from SAB onset to first steering movement (RTsteer).
For the purposes of a ‘quick’ look at the pilot dataset I will only use RTtakeover for now.
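A minimal sketch of the RTtakeover calculation, assuming each trial provides the SAB onset time and the time of the disengage button press in seconds (None if the automation was never disengaged). The function names and the toy values are illustrative only.

```python
def rt_takeover(sab_onset_s, disengage_s):
    """Signed RT in seconds from SAB onset to the disengage button press.

    Negative when the driver took over before the SAB was introduced;
    None when the automation was never disengaged.
    """
    if disengage_s is None:
        return None
    return disengage_s - sab_onset_s


def positive_rts(trials):
    """Keep only takeovers that happened after SAB onset."""
    rts = [rt_takeover(onset, press) for onset, press in trials]
    return [rt for rt in rts if rt is not None and rt > 0]


# Toy trials: (SAB onset, button press time).
trials = [(5.0, 5.75), (7.0, 6.5), (9.0, None), (6.0, 8.5)]
kept = positive_rts(trials)
```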
Fig 3. RT takeover across failure types
Fig 3 shows massive differences in how quickly drivers disengage the automation after SAB onset (using positive RTs only). In the Benign condition drivers disengaged the system in 43 % of trials. Since drivers react at such different speeds across the three failure types the failure type conditions will be plotted separately.
Fig 4. RT takeover across cognitive loads
Fig 4 shows cognitive load differences in RTtakeover within failure type (using positive RTs only, collapsed across bend radii). For Sudden failures (leftmost in Fig 4) there are clear differences between None, Easy, and Hard. People switch to manual mode quickest when not performing a distractor task; they switch slowest when the distractor task is more difficult. However, note that the effect size here will be pretty small since the peaks are only about 0.1 s apart (modes are 0.59 s, 0.65 s, and 0.72 s respectively) with SDs of approximately 0.2 s. This difference is less pronounced for Gradual failures (middle) and Benign failures (right). In the Gradual and Benign failure conditions the data are spread out over a wider range than in Sudden failures, so the depleted dataset causes problems for interpreting the shape of the distribution and the peaks may be unreliable. Nevertheless, for Gradual failures the median RT for No load seems to be earlier (2.21 s) than for the Easy (2.52 s) and Hard (2.45 s) cognitive load conditions.
Fig 5, below, plots RTtakeover by bend radii for each failure type. Differences in RTtakeover between bend radii, if there are any, appear small.
Fig 5. RT takeover across bend radii
H1 Summary: of course this needs inferential statistics to support my eyeballing, but I will tentatively suggest that reaction time to silent failures does indeed increase as cognitive load increases, though this effect is small and depends on the severity of the failure.
The next section will examine the steering response after takeover.
Firstly, we plot some trajectories to understand the constraints the driver is placed under. To make the failures unpredictable there are 6 potential automation trajectories, with onset times of the SAB varying between 5 and 9 s. This variability makes for messy plots if plotting all trajectories in Cartesian coordinates. Therefore, Fig 6 fixes the onset time (= 5 s, blue star in Fig 6) and the bend radius (= 40 m). You can see in Fig 6 that the potential automation trajectories all similarly hug the midline. It is also clear that the Sudden take-overs are tightly bunched compared to the Gradual and Benign (see also the RTs in Fig 4). Note that in the Gradual panel there is one trial where the take-over happens before the SAB is introduced. This happens in 3 % of trials.
Fig 6. Trajectories with a fixed onset time (at 5 seconds) and for bend radius of 40 m. Blue star is the time of onset. Dots are the time of take-over
Lane position allows for a better comparison of trajectories (Fig 7). Fig 7 plots the lane position immediately after switching to manual control. On average drivers seem to switch when lane position is closer to the midline in Sudden failures than in Gradual or Benign failures. However, due to the larger SAB, drivers take longer to correct for the error in lane position. The fact that drivers tend to switch at a lower lane error in Sudden failures suggests that drivers may be responding to the rate of change of lane position, which is higher in Sudden failures (i.e. a quantity such as Time to Lane Crossing), rather than lane position per se.
Fig 7. Steering bias for different failure types. Note that to avoid overplotting, oversteer SABs for Benign failures are not shown
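To make the distinction between lane position and Time to Lane Crossing (TTLC) concrete, here is a first-order sketch that assumes a constant lateral velocity. This is not necessarily how TTLC was computed for the experiment (which involves bends and a yaw-rate bias); the function name, the sign convention, and the half road width of 1.5 m are all assumptions for illustration.

```python
import math


def ttlc_first_order(lane_position_m, lateral_velocity_ms, half_road_width_m):
    """First-order TTLC: time until the road edge is reached if the
    current lateral velocity stays constant.

    lane_position_m is the signed distance from the midline and
    lateral_velocity_ms its signed rate of change.
    """
    if lateral_velocity_ms == 0:
        return math.inf  # not drifting, so the edge is never reached
    # The vehicle is heading toward whichever edge matches the velocity sign.
    edge = half_road_width_m if lateral_velocity_ms > 0 else -half_road_width_m
    return (edge - lane_position_m) / lateral_velocity_ms


# A small lane error with fast drift can be more urgent than
# a larger lane error with slow drift.
fast_drift = ttlc_first_order(0.5, 1.0, 1.5)
slow_drift = ttlc_first_order(1.0, 0.25, 1.5)
```

In this toy case the trial with the smaller lane error has the shorter TTLC, which is the pattern a rate-sensitive takeover criterion would pick up.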
Fig 8 stratifies cognitive loads within failure types. From the average trajectories it appears that drivers tend to correct for errors more quickly when they are not cognitively loaded. This is particularly true for Sudden failures and to a lesser extent Gradual failures. There are not many trials included in the Benign plot, so I would caution against drawing any conclusions from the averages. Correcting for errors more slowly when cognitively loaded was a feature of Wilkie et al., 2019. However, in that experiment takeovers started from the same lane position each time. In this experiment, if drivers were slower to take over, more error would have accrued, so it is plausible that differences in lane position over time (Fig 8) could be partly due to RT differences between cognitive load conditions (see Fig 4).
Fig 8. Steering bias for cogload within different failure types
To disentangle whether the separation between average trajectories for ‘No Load’ and ‘Loaded’ conditions is due to less aggressive steering in the cognitively loaded conditions, we can take a look at the steering wheel angle (SWA) plots over time (Fig 9). If the differences in lane position are only driven by delayed RTs, we would expect the average SWA traces to be overlapping (i.e. on average, identical steering responses across cognitive load conditions irrespective of small changes in initial lane position). Conversely, we might also expect a reduced (in magnitude) or delayed steering wheel angle response in cognitive load conditions, as per Wilkie et al., 2019.
Fig 9. SWA for cogload within different failure types. The dashed line is the point of takeover
In no condition do we see convincing evidence of a reduced or delayed steering response when cognitively loaded (Fig 9). In Benign and Gradual failures (right and middle, Fig 9) the steering response seems pretty similar across cognitive load conditions, suggesting any separation in the steering bias plot could be due to reaction times.
The Sudden failures (left, Fig 9) require more description. The first thing to note is that the recorded steering wheel angle reaches the value corresponding to the maximum yaw-rate. The steering wheel angle is not the true angle; it is re-calculated from a [1, -1] steering wheel value, which hits a limit at 90 degrees. Therefore, for many of the trials participants could be turning the wheel past the 90 degrees mark, yet the yaw-rate input to the simulation would be capped. This clearly happens for Sudden failures, where there is a need for a sharp turn. Despite the wheel angle capping, it is interesting that experiencing Hard cognitive load seems to cause greater wheel turns than Easy and None. It is probable that this is not due to an effect of cognitive load on steering. Rather, it could be the combined effect of slower RTs causing greater steering demand, which cannot be wholly compensated for due to the capped steering wheel angle. Indeed, disentangling the indirect effects of cognitive load on steering (from increasing RTs and therefore increasing steering demand) from the direct effects of cognitive load on steering will be difficult, as steering demands correlate with RTs. The next plot examines this issue by plotting three measures of steering aggression (SWAmax, SWAvar, and SWAvel) against RTtakeover.
Fig 10. Steering aggression measures correlated with RT. A) Maximum Steering Wheel Angle, B) Standard Deviation of Steering Wheel Angle, C) Average steering wheel velocity
Fig 10 plots the maximum steering wheel angle (SWAmax, Fig 10A), the standard deviation of steering wheel angle (SWAvar, Fig 10B), and the average steering wheel velocity (SWAvel, Fig 10C) for the manual control period of driving. Only trials with at least three seconds of manual driving are plotted, since by this time most drivers have made their initial steering correction (see Fig 9). Trials can have a maximum of 15 seconds of driving. A higher SWAmax would mean that the driver executed a sharper turn. A high SWAvar is often considered a (bad) proxy for wiggly steering. The two measures are correlated, since a higher SWAmax usually goes with a higher SWAvar. Steering wheel angle velocity is arguably a better measure of ‘jerkier’ steering, as higher SWAvel values would mean a driver turned the wheel more rapidly, but in this particular dataset the measure is partly confounded by the capping of the steering wheel angle: velocity equals zero for as long as the limit is hit.
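The three measures, and the way capping confounds SWAvel, can be sketched as below. Several things here are assumptions rather than the analysis actually used: a linear mapping from the [-1, 1] wheel value to degrees with the limit at 90, SWAvel taken as the mean absolute sample-to-sample velocity, and a toy trace with a 1 s sample interval.

```python
from statistics import stdev


def wheel_value_to_angle(value, limit_deg=90.0):
    """Assumed linear mapping from wheel value to degrees; values beyond
    [-1, 1] are clipped, so turns past the limit are recorded as the limit."""
    clipped = max(-1.0, min(1.0, value))
    return clipped * limit_deg


def steering_measures(angles_deg, dt_s):
    """Return (SWAmax, SWAvar, SWAvel) for one manual-control period."""
    swa_max = max(abs(a) for a in angles_deg)
    swa_var = stdev(angles_deg)  # standard deviation of the angle trace
    velocities = [abs(b - a) / dt_s for a, b in zip(angles_deg, angles_deg[1:])]
    swa_vel = sum(velocities) / len(velocities)
    return swa_max, swa_var, swa_vel


# Toy trace: the driver keeps turning past the limit (raw values above 1.0),
# but the recorded angle is stuck at 90 degrees.
raw_values = [0.0, 0.5, 1.0, 1.5, 1.5]
angles = [wheel_value_to_angle(v) for v in raw_values]
swa_max, swa_var, swa_vel = steering_measures(angles, dt_s=1.0)
```

Once the limit is reached the sample-to-sample velocity is zero, which pulls the average velocity down even though the driver may still be turning the wheel.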
Reassuringly, similar trends are seen across all measures, suggesting that in our experiment these higher values on all three measures are indicative of steering corrections that are executed with greater magnitude and speed. Here we will refer to such steering corrections as steering demand. Fig 10 is designed to disentangle the relative contribution of RTtakeover and cognitive load to steering demand. It is hypothesised that a positive correlation will exist between RTtakeover and the steering demand measures, since the later the RTtakeover the closer one is to lane boundaries. It is further hypothesised, based on the literature base, that increased cognitive load might cause a reduction in steering demand. If these two hypotheses did not interact, one would see a positive correlation of RTtakeover and steering demand, with vertically separated regression lines for each cognitive load condition. The different failure conditions are discussed in turn:
In Sudden failures there is a positive correlation on all measures. In SWAmax and SWAvel the correlation is weaker, presumably due to the capping of steering wheel angle limiting the maximum values for both measures. The coloured regression lines for separate cognitive load conditions are generally overlapping, and when they are not it seems to be due to outlying values (i.e. there are some slow responses in the Hard condition). Therefore, it seems that any change in steering demand is due to cognitive load causing slower RTtakeovers rather than reducing the propensity to correct for errors.
In Gradual failures we observe a consistent positive correlation between RTtakeover and steering demand. It’s probable that the trend is easier to see than in Sudden failures because drivers rarely hit the clipping limit in Gradual failures (Fig 9), so the measures are less confounded. It is interesting that in all measures the regression line for None sits slightly above the regression lines for Hard and Easy, hinting at a direct effect of cognitive load on steering demand. However, such an effect appears to be small compared to the impact of taking over control later.
Note that in Benign failures there are not many trials plotted because trials were excluded if they contained less than three seconds of manual control. Nevertheless, taking over later does not appear to result in greater steering demand. Remember that the Benign failures do not require a takeover, so it is possible that drivers sometimes switch to manual and do not make any corrections. Nor do there appear to be differences between cognitive load conditions in steering demand.
H2 Summary: I will tentatively suggest that the direct effect of cognitive load on the aggression of the first steering response is not large enough to pay serious attention to. I propose that the speed of response (which seems to fluctuate with cognitive load) is a stronger determinant of the strength of the first steering response.
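The logic of comparing regression lines across cognitive load conditions can be sketched with an ordinary least-squares fit per condition. This is an illustration on made-up numbers, not the analysis behind Fig 10; the function names and data layout are assumptions. Parallel lines with different intercepts would correspond to a direct effect of load on steering demand, over and above the effect of RTtakeover.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_by_condition(trials):
    """trials: iterable of (cogload, rt_takeover, steering_demand)."""
    groups = {}
    for cogload, rt, demand in trials:
        xs, ys = groups.setdefault(cogload, ([], []))
        xs.append(rt)
        ys.append(demand)
    return {cogload: fit_line(xs, ys) for cogload, (xs, ys) in groups.items()}


# Made-up data: same slope in both conditions, None offset upward by 5.
toy_trials = [
    ("None", 0.5, 20.0), ("None", 1.0, 30.0), ("None", 1.5, 40.0),
    ("Hard", 0.5, 15.0), ("Hard", 1.0, 25.0), ("Hard", 1.5, 35.0),
]
fits = fit_by_condition(toy_trials)
```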
Fig 11. A) Lane position and RT takeover. B) TTLC and RT takeover
However, the magnitude of steering corrections is not solely determined by how quickly drivers respond. Error develops at different rates in different conditions, so an identical RTtakeover across both Gradual and Sudden failure types results in different lane positions (Fig 11A) and urgency (Fig 11B) at takeover. Therefore, if one plots SWAmax against RTtakeover, stratified by failure type, there are clear differences between the conditions (Fig 12A). If steering magnitude were driven solely by speed of response, we would expect both failure types to ‘collapse’ along the same dimension. Fig 12B (SWAmax by lane position) and Fig 12C (SWAmax by TTLC) seem to be closer to achieving a single dimension which determines response across conditions, yet there is still reasonable stratification across failure type conditions.
Fig 12. A) SWA max by RT and Failure type. B) SWA max by lane position and Failure type. C) SWA max by TTLC and Failure type
So far we have examined the speed of takeover and aggression of the initial steering response. These vary massively across failure type conditions, and less so across cognitive load conditions. However, the SAB magnitudes used in our conditions are fairly artificial (though the TTLCs were based on the literature), and we are not particularly interested in whether there are differences between our categorical conditions. We are more interested in what perceptual information people are attending to that means they react quicker and with greater magnitude of response in one condition than in another.
With this in mind we proceed to conduct regression analysis with variables from the scene as continuous predictors, rather than with failure_type as a categorical predictor. In the first instance the RTtakeover is predicted from the state of the world at 0.25 s before the switch to manual mode (to allow for execution delays). Two variables are chosen for a first pass: lane position and change in lane position.
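Extracting the two predictors can be sketched as follows, assuming each trial is stored as parallel lists of sample times and lane positions. The function name, the data layout, and the use of a backward finite difference for the change in lane position are assumptions; only the 0.25 s lag comes from the text.

```python
from bisect import bisect_right


def state_before_takeover(times_s, lane_positions_m, takeover_time_s, lag_s=0.25):
    """Lane position and its rate of change at lag_s before takeover.

    Uses the last sample at or before the lagged time and a backward
    finite difference for the change in lane position.
    """
    target = takeover_time_s - lag_s
    i = bisect_right(times_s, target) - 1  # index of last sample <= target
    if i < 1:
        raise ValueError("not enough samples before the lagged time")
    dt = times_s[i] - times_s[i - 1]
    lane_pos = lane_positions_m[i]
    lane_vel = (lane_positions_m[i] - lane_positions_m[i - 1]) / dt
    return lane_pos, lane_vel


# Toy trial sampled every 0.25 s, with takeover at t = 1.0 s.
times = [0.0, 0.25, 0.5, 0.75, 1.0]
lane = [0.0, 0.1, 0.3, 0.6, 1.0]
lane_pos, lane_vel = state_before_takeover(times, lane, takeover_time_s=1.0)
```

The resulting pair per trial would then serve as the continuous predictors of RTtakeover in the regression.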
In the pre-registration there were three more hypotheses. Here I briefly list what we now know.
Not looked at yet, will look at if I have time before hols, but do not clearly see how this will change the design?
For sudden failures RTtakeover appears to be stratified somewhat by Easy or Hard load. However, this effect is small, and smaller still (or nonexistent) in the less severe failures.
There does not seem to be an interaction jumping out for bend radii. As stated above, the small effects of cognitive load are more pronounced for Sudden failures.